Intro to Python - Basics - 1

One should look for what is and not what he thinks should be. (Albert Einstein)

Basics: Topic introduction

In this part of the course, we will cover the following concepts:

  • Data Science industry overview
  • Python as a programming language and the tools used to write and execute Python code
  • Basic operations and data types

Module completion checklist

Objective Complete
Discuss how programming is used across industries and define core functions of data scientists
Explain the data science life cycle and ways to use predictive modeling
Summarize data science use cases for Python

Why are we learning to program?

I’m a […] major, why do I need to learn programming?

  • Programming is becoming a universal skill, much as typing in Word or making eye-catching PowerPoint presentations became in the 1990s and early 2000s
  • Programming makes it practical to repeat the same operations many times and on a large scale
  • Programming is a necessary component of reproducible research
  • Programming forces you to think through your problem precisely: you must clearly formulate your end goal and the methods you would like to use
  • The list goes on …
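
As a minimal sketch of what repeating the same operation at scale looks like in Python: the function name `to_celsius` and the sample readings below are made up for illustration and are not part of the course material.

```python
def to_celsius(fahrenheit):
    """Convert one temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9

# Hypothetical sample readings; a real dataset could hold millions of values.
readings_f = [32, 50, 68, 212]

# The same operation is applied to every reading with a single line of code.
readings_c = [round(to_celsius(f), 1) for f in readings_f]
print(readings_c)
```

The conversion is written once and reused for every value, so the same script works unchanged whether the list holds four readings or four million.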

What level of proficiency do I need?

  • To use programming as a tool in your professional toolkit, you don’t need to be a computer scientist or to know as much as one

  • The level of proficiency you need will depend on:

    • the problems you are trying to solve on a daily basis
    • the subject matter area you are in
    • the level of sophistication of the solutions you would like to implement
  • Subject matter experts who also use various programming tools and languages are usually known as data analysts or data scientists


What are the problems you are trying to solve? What is your area of expertise? What level of complexity would you like your programmatic solution to have?

A data scientist can

  1. Pose the right question
  2. Wrangle the data (gather, clean, and sample data to get a suitable dataset)
  3. Manage the data for easy access by the organization
  4. Explore the data to generate a hypothesis
  5. Make predictions using statistical methods such as regression and classification
  6. Communicate the results using visualizations, presentations, and products

What do data scientists do?

Use programming languages and tools to

  1. Wrangle the data (gather, clean, and sample data to get a suitable dataset)
  2. Manage the data for easy access by the organization
  3. Explore the data to generate a hypothesis
  4. Make predictions using statistical methods such as regression and classification

It follows from the list above that you should know a programming language (or two, or three, or …) well enough to perform these operations!
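
To give a rough feel for three of these operations (wrangle, explore, predict), here is a small sketch that uses only the Python standard library. The dataset, the variable names, and the choice of a straight-line least-squares fit are all assumptions made for illustration; real projects typically rely on dedicated analytical libraries.

```python
from statistics import mean

# Hypothetical raw records: (advertising spend, sales); None marks a missing value.
raw = [(1, 2.0), (2, 4.1), (3, None), (4, 8.2), (5, 9.9)]

# 1. Wrangle: drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 2. Explore: summarize the data to form a hypothesis (sales grow with spend).
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
print("mean spend:", mean(xs), "mean sales:", mean(ys))

# 3. Predict: fit a straight line y = slope * x + intercept by least squares.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

def predict(x):
    """Predict sales for a given spend using the fitted line."""
    return slope * x + intercept

print("predicted sales at spend 6:", round(predict(6), 2))
```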

Module completion checklist

Objective Complete
Discuss how programming is used across industries and define core functions of data scientists ✔
Explain the data science life cycle and ways to use predictive modeling
Summarize data science use cases for Python

Data science control cycle: framework for data

  • There is a protocol or standard for working with data that most data scientists follow
  • The cycle involves everything from asking the right questions and being knowledgeable about the data you’re studying, to optimizing your model’s performance

Data Science Control Cycle (DSCC)

Question - Which part of the cycle do you think takes up the most time?

  1. Data Cleaning and Collection
  2. Data Analysis and Modeling
  3. Learning New Techniques

How do you think data scientists spend their time?

How data scientists actually spend their time

DSCC: SMART questions

SPECIFIC

  • How are you framing the question?
  • What specific variables?

MEASURABLE

  • What metrics are you using?
  • What are the success criteria?

ACHIEVABLE

  • Scope your analysis well
  • Use data that is available to you

RELEVANT

  • Who will use this analysis?
  • Is it interesting or usable?

TIME-BOUND

  • Reference the time frame of the analysis
  • If predicting, over what horizon: the next year? The next month? Ever?

DSCC: research

  • Data is key to quality results
  • "Garbage in, garbage out" is a famous programming mantra that holds true for data science as well!
  • It should always be on your mind when working with any dataset
  • Neither the suitability of the data for your research nor its quality should ever be overlooked
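
One hedged illustration of checking data quality before analysis: the records, the field names, and the validity rules (an age between 0 and 120, a non-negative income) are all invented for this sketch.

```python
# Hypothetical survey records; the field names are made up for this example.
records = [
    {"age": 34, "income": 52000},
    {"age": -5, "income": 48000},   # impossible age
    {"age": 41, "income": None},    # missing income
    {"age": 29, "income": 61000},
]

def is_valid(record):
    """Return True if the record passes basic sanity checks."""
    age, income = record["age"], record["income"]
    return age is not None and 0 <= age <= 120 and income is not None and income >= 0

# Keep the good records, but also keep track of what was rejected and why it matters.
valid = [r for r in records if is_valid(r)]
rejected = [r for r in records if not is_valid(r)]
print(f"kept {len(valid)} of {len(records)} records")
```

Keeping the rejected records in their own list, rather than silently discarding them, makes it easier to ask later whether the data source itself has a problem.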

DSCC: modeling

  • A model is, by definition, a replica of a real thing
  • Select a model that suits your problem and data, or that simulates the real-life situation as closely as possible

DSCC: steps 4 - 5

  • We all have prior knowledge that sometimes predisposes us to make incorrect assumptions
  • Don’t let your extensive experience get in the way: always start as if you know nothing about the problem!

  • Validate and test your assumptions before delivering results to stakeholders!
  • Adjust the model if necessary
  • Have you ever made incorrect assumptions about a problem you were trying to solve?
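
One common way to test an assumption before delivering results is to check it on data that was held back while fitting the model. The sketch below assumes a made-up dataset in which y is roughly twice x, and a deliberately simple one-parameter model; both are illustrative choices, not a prescribed method.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical observations: y is about 2 * x plus a little random noise.
data = [(x, 2 * x + rng.uniform(-1, 1)) for x in range(20)]

# Hold back a quarter of the data; the model never sees it while fitting.
rng.shuffle(data)
train, test = data[:15], data[15:]

# Fit the slope of y = slope * x by least squares, using the training data only.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Validate: measure the error on the held-out data.
test_error = mean(abs(y - slope * x) for x, y in test)
print(f"slope = {slope:.2f}, mean absolute error on held-out data = {test_error:.2f}")
```

If the error on the held-out data were much larger than expected, that would be a signal to revisit the assumption or adjust the model.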

DSCC: step 6

  • Interpretation of the results is as important as the results themselves
  • Use your best judgment and expertise to convey to stakeholders the information that the data actually carries
  • Make your conclusions actionable, so that stakeholders know what next steps to take from looking at your results

Predictive modeling

  • What is a predictive model and why would we use one?

    • A predictive model is a mathematical formula that is fitted to past data and used to predict future outcomes
    • Predictive modeling allows us to do more with less
    • Predictive modeling allows us to identify individuals who are most likely to take or not take an action

Example: campaign response rate

Predictive modeling: when is it not useful?

  • Predictive modeling is not useful if you have an unlimited campaign budget and can target everyone
  • Otherwise, a predictive model provides better targeting information, which helps reduce the campaign budget
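
The targeting idea can be sketched in a few lines. The customer IDs, the predicted probabilities, and the budget of three contacts are all invented for illustration; in practice the probabilities would come from a model fitted to past campaign data.

```python
# Hypothetical customers: (customer id, model-predicted probability of responding).
scored = [("a", 0.90), ("b", 0.05), ("c", 0.60), ("d", 0.10), ("e", 0.35), ("f", 0.02)]

budget = 3  # assume we can only afford to contact three customers

# Rank customers from most to least likely to respond and contact the top ones.
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
targeted = ranked[:budget]

# Expected number of responses = sum of the predicted probabilities.
expected_targeted = sum(p for _, p in targeted)
expected_everyone = sum(p for _, p in scored)

print("contact:", [cid for cid, _ in targeted])
print(f"expected responses: {expected_targeted:.2f} of {expected_everyone:.2f} "
      f"while contacting {budget} of {len(scored)} customers")
```

In this made-up example, contacting half of the customers captures most of the expected responses, which is the sense in which a predictive model lets us do more with less.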

Predictive modeling: high-level overview


Module completion checklist

Objective Complete
Discuss how programming is used across industries and define core functions of data scientists ✔
Explain the data science life cycle and ways to use predictive modeling ✔
Summarize data science use cases for Python

What is Python?

  • Python is a powerful programming language that data scientists love because of its:

    • inherent readability and simplicity compared to lower-level languages
    • large number of dedicated analytical libraries for working with numerical, text, and image data
    • constantly growing ecosystem of 72,000+ libraries

What can you do with Python?

Natural Language Processing

  • Sentiment analysis
  • Twitter analysis using live Twitter feeds

Deep Learning

  • Object recognition
  • Facial recognition

Visualization

  • Interactive visualization deployable to websites
  • 3D visualizations

Automation, Big Data, and Predictive Modeling

  • Web scraping to automate the collection of data from websites
  • Data wrangling with fast and efficient functions
  • Big Data modeling through integration with Apache Spark
  • Machine learning on structured data
  • Data ingestion from various sources

Knowledge check

Module completion checklist

Objective Complete
Discuss how programming is used across industries and define core functions of data scientists ✔
Explain the data science life cycle and ways to use predictive modeling ✔
Summarize data science use cases for Python ✔

Congratulations on completing this module!
